Abstract

We study the return interval $\tau$ between price volatilities that are above a certain threshold $q$ for 31 intraday datasets, including the Standard & Poor's 500 index and the 30 stocks that form the Dow Jones Industrial index. For different thresholds $q$, the probability density function $P_q(\tau)$ scales with the mean interval $\bar{\tau}$ as $P_q(\tau) = \bar{\tau}^{-1} f(\tau/\bar{\tau})$, similar to that found in daily volatilities. Since the intraday records have significantly more data points than the daily records, we can probe much higher thresholds $q$ and still obtain good statistics. We find that the scaling function $f(x)$ is consistent for all 31 intraday datasets at various time resolutions, and the function is well approximated by the stretched exponential, $f(x) \sim e^{-a x^{\gamma}}$, with $\gamma = 0.38 \pm 0.05$ and $a = 3.9 \pm 0.5$, which indicates the existence of correlations. We analyze the conditional probability distribution $P_q(\tau|\tau_0)$ for $\tau$ following a certain interval $\tau_0$, and find that $P_q(\tau|\tau_0)$ depends on $\tau_0$, which demonstrates memory in intraday return intervals. Also, we find that the mean conditional interval $\langle\tau|\tau_0\rangle$ increases with $\tau_0$, consistent with the memory found for $P_q(\tau|\tau_0)$. Moreover, we find that return interval records have long-term correlations with correlation exponents similar to those of volatility records.

I. INTRODUCTION

Statistical properties of price fluctuations are very important to understand and model financial market dynamics, which has long been a focus of economic research. Stock volatility is of interest to traders because it quantifies risk, helps optimize the portfolio, and provides a key input to option pricing models that are based on the estimation of the volatility of the asset.

[...] logarithmic changes of stock price from time $t-1$ to time $t$, [...] absolute values are known to be long-term power-law correlated [32]. The probability density function (pdf) of $G(t)$ possesses a power-law distribution

(2)

with $\zeta \approx 3$ [9,32,33,34]. Also, $n_q(t)$, the number of times that the volatility $|G(t)|$ exceeds a threshold $q$, follows a power law in the time $t$ after a market crash,

$$n_q(t) \sim t^{-p}, \tag{3}$$

with $p \approx 1$. Eq. (3) is the financial analog of the Omori earthquake law.

Recently Yamasaki et al. [37] studied the behavior of return intervals $\tau$ between volatilities that are above a certain threshold $q$ [illustrated in Fig. 1(a)]. They analyzed daily financial records and found scaling and memory in return intervals, similar to that found in climate data [38]. To investigate the generality of these statistical features of Ref. [37], here we examine 31 intraday datasets. We find that similar scaling and memory behavior occurs at a wide range of time resolutions (not only on the daily scale). Due to the larger size of the datasets we analyze, we are able to extend our work to significantly larger values of $q$. Remarkably, the scaling functions are well approximated by the stretched exponential form, which indicates long-range correlations in volatility records [38]. Also, we explore clusters of short and long return intervals, and find that the larger the cluster, the stronger the memory.

II. DATABASES ANALYZED

We analyze the Trades and Quotes (TAQ) database from the New York Stock Exchange (NYSE), which records every trade for all the securities in the United States stock market, for the two-year period from January 1, 2001 to December 31, 2002, a total of 497 trading days. We study all 30 companies of the Dow Jones Industrial Average index (DJIA). The sampling time is 1 minute and the average size is about 160,000 values per DJIA stock. The other database we analyze is the Standard and Poor's 500 index (S&P 500), which consists of 500 companies. This database covers a 13-year period, from January 1, 1984 to December 31, 1996, with one data point every 10 minutes (about 130,000 data points in total). For both databases, all market closure times are removed, so the records are continuous over regular open hours for all trading days.



III. VOLATILITY DEFINITION 



In contrast to daily volatilities, the intraday data are known to show specific patterns due to different behaviors of traders at different periods during the trading day. For example, the market is very active immediately after the opening [26], due to information arriving while the market is closed. To understand the possible effect on volatility correlations, we investigate the daily trend in DJIA stocks. The intraday pattern, denoted as $A(s)$ [32], is defined as

$$A(s) \equiv \frac{1}{N} \sum_{i=1}^{N} |G^i(s)|, \tag{4}$$

which is the absolute return at a specific moment $s$ of the day averaged over all $N$ trading days, where $G^i(s)$ is the price change at time $s$ in day $i$. As shown in Fig. 1(b), the intraday pattern $A(s)$ has similar behavior for the four stocks AT&T, Citi, GE, IBM and the average over 30 DJIA stocks. The pattern is not uniform: it exhibits a pronounced peak at the opening hours and a minimum around time $s = 200$ min, which may cause some artificial correlations. To avoid the effect of this daily oscillation, we remove the intraday pattern by studying

$$G'(t) = |G(t)|/A(s). \tag{5}$$
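
As an illustration of Eqs. (4) and (5), the following is a minimal sketch, assuming that $A(s)$ is the mean absolute return in each intraday slot and that every trading day has the same number of slots. The function names and the toy numbers are hypothetical and are not taken from the datasets analyzed here.

```python
from statistics import fmean


def intraday_pattern(returns_by_day):
    """A(s): mean absolute return at intraday slot s, averaged over all days.

    returns_by_day[i][s] plays the role of G^i(s), the return on day i at slot s.
    Assumption: all days have the same number of slots and A(s) is never zero.
    """
    n_slots = len(returns_by_day[0])
    return [fmean(abs(day[s]) for day in returns_by_day) for s in range(n_slots)]


def detrended_volatility(returns_by_day):
    """G'(t) = |G(t)| / A(s), flattened into a single series in time order."""
    pattern = intraday_pattern(returns_by_day)
    return [abs(g) / pattern[s] for day in returns_by_day for s, g in enumerate(day)]


# Toy data: two days with three slots each (made-up values).
days = [[0.02, -0.01, 0.03], [-0.04, 0.01, -0.01]]
print(intraday_pattern(days))
print(detrended_volatility(days))
```

By construction the detrended series has mean 1 in every intraday slot, so the opening peak no longer dominates the events selected by a threshold.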
In order to compare different stocks, we define the normalized volatility $g(t)$ by dividing $G'(t)$ by its standard deviation,

$$g(t) \equiv \frac{G'(t)}{\left(\langle G'(t)^2\rangle - \langle G'(t)\rangle^2\right)^{1/2}}, \tag{6}$$

where $\langle\ldots\rangle$ is the time average for each separate stock. Consequently, the threshold $q$ is measured in units of the standard deviation of $G'(t)$. As shown in Fig. 1(a), every volatility $g(t)$ above a threshold $q$ ("event") is picked and the series of the time intervals between those events, $\{\tau(q)\}$, is generated. The series depends on the threshold $q$. To maintain good statistics and avoid spurious discreteness effects, we restrict ourselves to thresholds with average intervals $\bar{\tau} = \bar{\tau}(q) > 3$ time units (30 minutes for the S&P 500 and three minutes for the 30 stocks of the DJIA).
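
The normalization of Eq. (6) and the construction of the return interval series can be sketched as follows. This is a minimal example under stated assumptions: the standard deviation is the population standard deviation, an "event" is a value strictly above $q$, and the function names and toy series are hypothetical.

```python
from statistics import pstdev


def normalize(volatility):
    """g(t) = G'(t) / std(G'), so thresholds q are in units of standard deviations."""
    sigma = pstdev(volatility)  # sqrt(<x^2> - <x>^2), as in Eq. (6)
    return [v / sigma for v in volatility]


def return_intervals(g, q):
    """Time gaps (in sampling units) between successive values of g above q."""
    event_times = [t for t, value in enumerate(g) if value > q]
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


# Toy normalized volatility series (made-up values).
g = [0.5, 2.5, 0.1, 0.3, 3.1, 2.2, 0.4, 0.2, 0.1, 2.7]
print(return_intervals(g, 2.0))  # [3, 1, 4]
print(return_intervals(g, 3.0))  # []  (a single event defines no interval)
```

Raising the threshold leaves fewer events and therefore longer and fewer intervals, which is why large $q$ requires long records.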

IV. SCALING PROPERTIES 

We study the dependence of $P_q(\tau)$ on $q$, where $P_q(\tau)$ is the pdf of the return interval series $\{\tau(q)\}$. Figure 2 shows results for the S&P 500 index and for two typical DJIA stocks, Citi and GE. The time window $\Delta t$ of volatility records is 1 minute for the DJIA stocks and 10 minutes for the S&P 500. The left panels of Fig. 2 [(a), (c), (e)] show that the pdf $P_q(\tau)$ for large $q$ decays slower than for small $q$. The right panels of Fig. 2 [(b), (d), (f)] show the scaled pdf $P_q(\tau)\bar{\tau}$ as a function of the scaled return intervals $\tau/\bar{\tau}$. The five curves for $q = 2$, 3, 4, 5 and 6 collapse onto a single curve. Thus the distribution functions follow the scaling relation

$$P_q(\tau) = \frac{1}{\bar{\tau}} f(\tau/\bar{\tau}). \tag{7}$$

We also study the other 28 DJIA stocks and find that they have similar scaling behavior for different thresholds.

To examine the scaling for larger thresholds with good statistics, we calculate the return intervals of each DJIA stock, and then aggregate all the data. As shown in Fig. 2(g) and (h), the scaling behavior extends even to $q = 15$. In Eq. (7), the scaling function $f(\tau/\bar{\tau})$ does not depend directly on the threshold $q$ but only through $\bar{\tau} = \bar{\tau}(q)$. Therefore, if $P_q(\tau)$ for an individual value of $q$ is known, distributions for other thresholds can be predicted by the scaling Eq. (7). In particular, the distribution of rare events (very large $q$, such as market crashes) may be extrapolated from smaller thresholds, which have enough data to achieve good statistics.
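
The scaled quantities in Eq. (7) can be computed from an interval series as in the sketch below. It assumes integer-valued intervals and unit-width bins, so that $P_q(\tau)$ is simply the fraction of intervals equal to $\tau$; the binning actually used for the figures is not specified in the text and may differ (for example, logarithmic bins). The function name is hypothetical.

```python
from collections import Counter


def scaled_pdf(intervals):
    """Return the mean interval and the points (tau/mean, P(tau)*mean).

    Assumption: intervals are integers and P(tau) is estimated with unit-width
    bins, i.e. P(tau) = (number of intervals equal to tau) / (total number).
    """
    mean = sum(intervals) / len(intervals)
    counts = Counter(intervals)
    total = len(intervals)
    points = [(tau / mean, counts[tau] / total * mean) for tau in sorted(counts)]
    return mean, points


mean, points = scaled_pdf([1, 1, 2, 4])
print(mean)    # 2.0
print(points)  # [(0.5, 1.0), (1.0, 0.5), (2.0, 0.5)]
```

Plotting the points obtained for several thresholds on the same axes is the collapse test: if the scaling relation holds, the point sets for different $q$ fall on one curve $f(x)$.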

Next, we investigate the similarity of scaling functions for different companies. Scaled pdfs $P_q(\tau)\bar{\tau}$ with $q = 2$ for return intervals (upper symbols) are plotted in Fig. 3(a), showing the S&P 500 index and 30 DJIA stocks in alphabetical order of names (one symbol represents one dataset). We see that the pdf curves collapse, so their scaling functions $f(x)$ are similar, consistent with a universal structure for $P_q(\tau)$. As suggested by the line on the upper symbols in Fig. 3(a) and on the filled symbols in Fig. 4, the function $f(x)$ may follow a stretched exponential form

$$f(x) \sim e^{-a x^{\gamma}}. \tag{8}$$

Remarkably, we find that all 31 datasets have similar exponent values, and conclude that $\gamma$ appears to be "universal", with

$$\gamma = 0.38 \pm 0.05. \tag{9}$$

The value of $a$ is found to be almost the same for all datasets,

$$a = 3.9 \pm 0.5. \tag{10}$$

Further, we plot the stretched exponential fit for four companies, AT&T, Citi, GE and IBM, in a log-linear plot [Fig. 3(b)]. We find good fits for all four companies, and we also find good collapse for different thresholds for each stock. The scaling function clearly differs from the Poisson distribution for uncorrelated data, $f(x) \sim e^{-x}$, which is shown by the curves on the lower symbols in Fig. 3(a).
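
One simple way to estimate the parameters of Eq. (8) is sketched below. It is not necessarily the fitting procedure used for the figures: it assumes the form $f(x) = c\,e^{-a x^{\gamma}}$ with a free prefactor $c$, scans a grid of trial $\gamma$ values, and for each one fits $\ln f$ against $x^{\gamma}$ by ordinary least squares. The function name, the prefactor and the grids are hypothetical choices.

```python
import math


def fit_stretched_exponential(xs, fs, gammas):
    """Fit ln f = ln c - a * x**gamma by least squares for each trial gamma.

    Returns (gamma, a, c) for the trial gamma with the smallest squared residual.
    """
    ys = [math.log(f) for f in fs]
    n = len(xs)
    best = None
    for gamma in gammas:
        us = [x ** gamma for x in xs]
        u_mean, y_mean = sum(us) / n, sum(ys) / n
        s_uu = sum((u - u_mean) ** 2 for u in us)
        s_uy = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
        slope = s_uy / s_uu
        intercept = y_mean - slope * u_mean
        residual = sum((y - intercept - slope * u) ** 2 for u, y in zip(us, ys))
        if best is None or residual < best[0]:
            best = (residual, gamma, -slope, math.exp(intercept))
    return best[1:]


# Synthetic check using the values quoted in the text (gamma = 0.38, a = 3.9).
# The prefactor c = 20 and the x grid are arbitrary choices for this sketch.
xs = [0.05 * k for k in range(1, 101)]
fs = [20.0 * math.exp(-3.9 * x ** 0.38) for x in xs]
gamma, a, c = fit_stretched_exponential(xs, fs, [0.30 + 0.01 * k for k in range(21)])
print(round(gamma, 2), round(a, 2), round(c, 2))
```

On noisy empirical pdfs the residual as a function of $\gamma$ is typically shallow, which is one reason to quote the exponent with an uncertainty rather than as a single value.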

For statistical systems, the time resolution of records is an important aspect. The system may exhibit diverse behavior in different time windows $\Delta t$. In Fig. 4 we analyze five time scales for four typical companies ($q = 2$): (a) AT&T, (b) Citi, (c) GE and (d) IBM. For $\Delta t = 1$, 5, 10, 15, and 30 minutes, the $P_q(\tau)\bar{\tau}$ curves collapse onto one curve, which shows the persistence of the scaling for a broad range of time scales. Thus there seems to be a universal structure not only across different companies, but also within each stock at various time resolutions.
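
The text does not state how the records at coarser resolutions were built. One natural construction, given here only as an assumption, is to sum consecutive 1-minute logarithmic returns over non-overlapping windows, since logarithmic returns are additive in time. The function name is hypothetical, and the handling of windows that would straddle two trading days is left out of this sketch.

```python
def coarse_grain_returns(one_minute_returns, window):
    """Sum consecutive 1-minute log returns into non-overlapping windows.

    The sum of log returns over `window` minutes equals the log return over
    that window. A trailing incomplete window is dropped.
    """
    n_full = len(one_minute_returns) // window
    return [sum(one_minute_returns[k * window:(k + 1) * window]) for k in range(n_full)]


# Integer toy values make the grouping easy to follow.
print(coarse_grain_returns([1, 2, 3, 4, 5, 6, 7], 3))  # [6, 15]
```

The coarse-grained returns would then go through the same pattern removal, normalization and thresholding steps as the 1-minute series.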

To understand the origin of the scaling behavior in return intervals, we analyze the pdfs of return intervals obtained after shuffling the volatility records (shuffling removes the correlations in the volatility records [37]). For uncorrelated data, as expected, a Poisson distribution is obtained, shown by the lower symbols in Fig. 3(a) and empty symbols in Fig. 4. In contrast to the distribution for uncorrelated records, the actual return intervals (the upper symbols in Fig. 3(a) and filled symbols in Fig. 4) are more frequent at both small and large values, and less frequent at intermediate values. The distinct difference between the distributions of return intervals in the original data and shuffled records suggests that the scaling behavior and the form in Eq. (8) must arise from long-term correlations in the volatility.
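
The shuffling test can be sketched as follows. This is a minimal illustration with a made-up, strongly clustered series; the function names and the fixed random seed are hypothetical choices. Shuffling keeps the set of volatility values, and hence the number of events above $q$, but destroys their ordering in time.

```python
import random


def return_intervals(g, q):
    """Time gaps between successive values of g above the threshold q."""
    event_times = [t for t, value in enumerate(g) if value > q]
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


def shuffled_return_intervals(g, q, seed=0):
    """Return intervals of a randomly permuted copy of g (correlations removed)."""
    shuffled = list(g)
    random.Random(seed).shuffle(shuffled)
    return return_intervals(shuffled, q)


# Toy series with one tight burst of 50 events followed by a quiet period.
g = [3.0] * 50 + [0.1] * 950
original = return_intervals(g, 2.0)
shuffled = shuffled_return_intervals(g, 2.0)
print(len(original), len(shuffled))   # same number of intervals in both cases
print(sum(original) / len(original))  # 1.0: every event directly follows the previous one
print(sum(shuffled) / len(shuffled))  # much larger once the burst is dispersed
```

Comparing the interval pdf of the original and shuffled series then isolates the contribution of temporal correlations, since the distribution of volatility values is identical in both.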



V. MEMORY EFFECTS 



The sequence of return intervals may, or may not, be fully characterized by $P_q(\tau)$, depending on the time organization of the sequence. If the return intervals are uncorrelated, they are independent of each other and totally determined by the probability distribution. On the other hand, if the intervals are correlated, the memory will also affect the order in the sequence of intervals.

To investigate the memory in the records, we study the conditional pdf $P_q(\tau|\tau_0)$, which is the probability of finding a return interval $\tau$ immediately after a return interval of size $\tau_0$. In records without memory, $P_q(\tau|\tau_0)$ should be identical to $P_q(\tau)$ and independent of $\tau_0$. Otherwise, it should depend on $\tau_0$. Due to the poor statistics for a single $\tau_0$, we study $P_q(\tau|\tau_0)$ for a bin (range) of $\tau_0$. The entire database is partitioned into 8 equal-size bins with intervals in increasing length. Fig. 5 shows $P_q(\tau|\tau_0)$ for $\tau_0$ in the smallest (solid symbols) and largest (open symbols) subset of the four stocks AT&T, Citi, GE and IBM. For $\tau_0$ in the lowest bin the probability is larger for small $\tau$, while for $\tau_0$ in the largest bin the probability is higher for large $\tau$. Thus, large $\tau_0$ tend to be followed by large $\tau$, while small $\tau_0$ tend to be followed by small $\tau$ ("clustering"), which indicates memory in the return interval sequence. Hence, long-term correlations in the volatility records affect the pdf of intervals as well as the time organization of $\tau$. Note also that $P_q(\tau|\tau_0)$ for all thresholds seems to collapse onto a single scaling function for each of the $\tau_0$ subsets.

Further, the memory is also seen in the mean conditional return interval $\langle\tau|\tau_0\rangle$, which is the first moment of $P_q(\tau|\tau_0)$, immediately after a given $\tau_0$ subset. Filled symbols in Fig. 6 show again that large $\tau$ tend to follow large $\tau_0$, and small $\tau$ follow small $\tau_0$, similar to the clustering in the conditional pdf $P_q(\tau|\tau_0)$. Correspondingly, shuffled data (empty symbols) exhibit a flat shape, demonstrating that the value of $\tau$ is independent of the previous interval $\tau_0$.
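
The mean conditional return interval can be estimated as in the sketch below. It assumes that the pairs (previous interval, next interval) are sorted by the previous interval $\tau_0$ and split into equal-size bins, with any remainder placed in the last bin; the function name and the toy series are hypothetical. It requires at least as many pairs as bins.

```python
def mean_conditional_interval(intervals, n_bins=8):
    """Scaled mean of the interval that follows tau0, for tau0 grouped into bins.

    Returns one (mean tau0 / mean, mean following tau / mean) pair per bin,
    with bins ordered from the smallest tau0 to the largest.
    """
    mean = sum(intervals) / len(intervals)
    # (tau0, next tau) pairs, sorted by tau0 only (stable for equal tau0).
    pairs = sorted(zip(intervals, intervals[1:]), key=lambda pair: pair[0])
    size = len(pairs) // n_bins
    result = []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size] if b < n_bins - 1 else pairs[b * size:]
        tau0_mean = sum(pair[0] for pair in chunk) / len(chunk)
        next_mean = sum(pair[1] for pair in chunk) / len(chunk)
        result.append((tau0_mean / mean, next_mean / mean))
    return result


# Toy clustered sequence: short intervals follow short ones, long follow long.
print(mean_conditional_interval([1] * 4 + [9] * 4, n_bins=2))
```

For a sequence without memory the second entry of each pair stays close to 1 for every bin, which corresponds to the flat shape of the shuffled records.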

The quantities $P_q(\tau|\tau_0)$ and $\langle\tau|\tau_0\rangle$ show memory for intervals that immediately follow an interval $\tau_0$, which indicates short-term memory in the return interval records. To study the possibility that long-term memory exists in the return interval sequence, we investigate the mean return interval after a cluster of $n$ intervals, all within a bin $\tau_0$. To obtain good statistics we divide the sequence into only two bins, separated by the median of the entire database. We denote intervals that are above the median by "+", and those below the median by "-". Accordingly, $n$ consecutive "+" or "-" intervals form a cluster and the mean of the return intervals after such $n$-clusters may reveal the range of memory in the sequence. Fig. 7 shows the mean return intervals $\langle\tau|\tau_0\rangle/\bar{\tau}$ vs. the size $n$, where $\tau_0$ in $\langle\tau|\tau_0\rangle/\bar{\tau}$ refers to a cluster of size $n$. For "+" clusters, the mean intervals increase with the size of the cluster, the opposite of that for "-" clusters. The results indicate long-term memory in the sequence of $\tau$ since we do not see a plateau for large clusters.
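
The cluster analysis can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: a cluster is taken to be any run of $n$ consecutive intervals on the same side of the median (so longer runs contribute several overlapping windows), and intervals exactly equal to the median belong to neither class. The function name is hypothetical.

```python
from statistics import median


def mean_after_clusters(intervals, n, sign):
    """Scaled mean of the interval that follows n consecutive '+' or '-' intervals.

    '+' means above the median of the whole sequence, '-' means below it.
    Returns None if no such cluster is followed by an interval.
    """
    med = median(intervals)
    mean = sum(intervals) / len(intervals)
    if sign == "+":
        in_class = [tau > med for tau in intervals]
    else:
        in_class = [tau < med for tau in intervals]
    followers = [intervals[i + n] for i in range(len(intervals) - n)
                 if all(in_class[i:i + n])]
    if not followers:
        return None
    return sum(followers) / len(followers) / mean


# Toy sequence: five short intervals followed by five long ones.
toy = [1] * 5 + [9] * 5
print(mean_after_clusters(toy, 2, "+"))  # above 1: long intervals follow '+' clusters
print(mean_after_clusters(toy, 2, "-"))  # below 1: short intervals follow '-' clusters
```

Repeating the calculation for increasing $n$ gives the curves described above; the absence of a plateau at large $n$ is what signals long-range memory.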

To further test the range of long-term correlations in the return interval time series, we apply the detrended fluctuation analysis (DFA) method [41]. After removing trends, the DFA method computes the root-mean-square fluctuation $F(\ell)$ of a time series within windows of $\ell$ points, and determines the correlation exponent $\alpha$ from the scaling function $F(\ell) \sim \ell^{\alpha}$. The exponent $\alpha$ is related to the autocorrelation function exponent $\gamma$ by

$$\alpha = 1 - \gamma/2, \tag{11}$$

with the autocorrelation function $C(t) \sim t^{-\gamma}$, where $0 < \gamma < 1$ [38,43]. When $\alpha > 0.5$, the time series has long-term correlations and exhibits persistent behavior, meaning that large values are more likely to be followed by large values and small values by small ones. The value $\alpha = 0.5$ indicates that the signal is uncorrelated (white noise).
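
A compact version of the DFA procedure is sketched below. The order of detrending and the window layout are not specified in the text, so this sketch assumes linear detrending (often called DFA-1) in non-overlapping windows; the function names, window sizes and random seed are hypothetical choices.

```python
import math
import random


def dfa_fluctuation(series, window):
    """RMS fluctuation F(l) of the integrated series after linear detrending."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:  # integrate the mean-subtracted series
        total += value - mean
        profile.append(total)
    n_windows = len(profile) // window
    xs = range(window)
    x_mean = (window - 1) / 2
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    squared = 0.0
    for w in range(n_windows):
        segment = profile[w * window:(w + 1) * window]
        y_mean = sum(segment) / window
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, segment)) / s_xx
        squared += sum((y - y_mean - slope * (x - x_mean)) ** 2
                       for x, y in zip(xs, segment))
    return math.sqrt(squared / (n_windows * window))


def dfa_exponent(series, windows):
    """Least-squares slope of log F(l) versus log l, i.e. the exponent alpha."""
    logs = [(math.log(l), math.log(dfa_fluctuation(series, l))) for l in windows]
    lx = sum(p[0] for p in logs) / len(logs)
    ly = sum(p[1] for p in logs) / len(logs)
    return (sum((p[0] - lx) * (p[1] - ly) for p in logs)
            / sum((p[0] - lx) ** 2 for p in logs))


# Uncorrelated Gaussian noise should give an exponent close to 0.5.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_exponent(white_noise, [8, 16, 32, 64, 128])
print(round(alpha, 2))
```

To look for a crossover, the slope can be fitted separately over window sizes below and above a chosen scale instead of over the whole range.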

We analyze the volatility series and the return interval series using the DFA method. The results for the S&P 500 index and the 30 DJIA stocks in two regimes (split at $\ell^* = 390$ for volatilities and $\ell^* = 93$ for return intervals, which corresponds to 1 day in time scale) are shown in Fig. 8 [40]. We see that the $\alpha$ values are distinctly different in the two regimes, and both are larger than 0.5, which indicates long-term correlations in the investigated time series that are not the same at different time scales. For large scales ($\ell > \ell^*$), $\alpha = 0.98 \pm 0.04$ for the volatility (group mean ± standard deviation) and $\alpha = 0.92 \pm 0.04$ for the return intervals are almost the same, and the differences are within the error bars. These results are consistent with Refs. [32], [37] for $\alpha$ of the volatilities, and with Ref. [37] for $\alpha$ of the return intervals. For short scales ($\ell < \ell^*$), we find $\alpha = 0.66 \pm 0.01$ for the volatility (consistent with Ref. [32]) and $\alpha = 0.64 \pm 0.02$ for the return intervals, and the differences are again within the range of the error bars. Here the error bars refer to those of each dataset, not to the standard deviation over the group of 31 datasets, and the average error bar is $\approx 0.06$. A similar crossover from short scales to large scales with similar values of $\alpha$ has also been observed for intertrade times by Ivanov et al. Such behavior suggests a common origin for the strong persistence of correlations in both volatility and return interval records, and in fact the clustering in return intervals is related to the known effect of volatility clustering.



VI. DISCUSSION AND CONCLUSION 



The value of $\gamma \approx 0.4$ could be a result of $\gamma = 2 - 2\alpha$ from Eq. (11), where $\alpha \approx 0.8$ is the average of the two $\alpha$ regimes that we observe (see Fig. 8). It is possible for the value of $\gamma$ to be different for small and large $q$ values. The reason for this difference is that for small $q$ the low volatilities are probed and therefore the time scales are controlled by $\alpha \approx 0.65$ (below the crossover), while for large $q$ the high volatilities are probed, which represent large time scales (above the crossover), controlled by $\alpha \approx 0.95$. We will undertake further analysis to test this possibility.
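
The arithmetic behind this argument is a direct inversion of Eq. (11). The short sketch below, with a hypothetical helper name, evaluates $\gamma = 2 - 2\alpha$ for the three approximate $\alpha$ values mentioned above.

```python
def gamma_from_alpha(alpha):
    """Invert Eq. (11), alpha = 1 - gamma/2, to obtain gamma = 2 - 2*alpha."""
    return 2.0 - 2.0 * alpha


for alpha in (0.65, 0.8, 0.95):
    print(alpha, round(gamma_from_alpha(alpha), 2))
# 0.65 0.7
# 0.8 0.4
# 0.95 0.1
```

Under this relation, the short-scale and large-scale exponents would correspond to quite different values of $\gamma$ (about 0.7 and 0.1), while their average reproduces the observed $\gamma \approx 0.4$.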

In summary, we studied scaling and memory effects in volatility return intervals for intraday data. We found that the distribution function for the return intervals can be well described by a single scaling function that depends only on the ratio $\tau/\bar{\tau}$ for DJIA stocks and the S&P 500 index, for various time scales ranging from $\Delta t = 1$ minute to $\Delta t = 30$ minutes. The scaling function, which results from the long-term correlations in the volatility records, differs from the Poisson distribution for uncorrelated data. We found that the scaling function can be well approximated by the stretched exponential form, $f(x) \sim e^{-a x^{\gamma}}$ with $\gamma = 0.38 \pm 0.05$ and $a = 3.9 \pm 0.5$. We showed strong memory effects by analyzing the conditional pdf $P_q(\tau|\tau_0)$ and the mean conditional return interval $\langle\tau|\tau_0\rangle$. Furthermore, we studied the mean interval after a cluster of intervals, and found long-term memory for clusters of both short and long return intervals. We demonstrated by the DFA method that the volatility and return intervals have long-term correlations with similar correlation exponents.

FIG. 1: (a) Illustration of volatility return intervals for a volatility time series for IBM on May 10, 2002. Return intervals $\tau_3$ and $\tau_2$ for two thresholds $q = 3$ and 2 are displayed. (b) The 5-min interval intraday pattern for AT&T, Citi, GE, IBM and the average over 30 DJIA stocks. The time $s$ is the moment in each day, while $A(s)$ is the mean return over all trading days. Note that all curves have a similar pattern, such as a pronounced peak after the market opens and a minimum around noon ($s \approx 200$ min).

FIG. 2: Distribution and scaling of return intervals for (a & b) Citi, (c & d) GE, (e & f) S&P 500 and (g & h) mixture of 30 DJIA stocks (for very large thresholds). Symbols are for different thresholds $q$, as shown in (c) for (a) to (f) and shown in (g) for (g) and (h). The sampling time for S&P 500 is 10 minutes, and for the stocks is 1 minute. For one dataset, the distributions $P_q(\tau)$ are different for different $q$, but they collapse onto a single curve for $P_q(\tau)\bar{\tau}$ vs. $\tau/\bar{\tau}$ ($\bar{\tau}$ is the mean interval), which indicates a scaling relation. (g) and (h) show that the scaling can extend to very large thresholds.

FIG. 3: (a) Scaling of return intervals for all 30 DJIA stocks and the S&P 500 index. The scaled distribution function $P_q(\tau)\bar{\tau}$ vs. $\tau/\bar{\tau}$ with threshold $q = 2$ is shown for actual return intervals, as well as for the shuffled volatility records (divided by 10). Every symbol represents one stock. The line on the symbols for original records suggests a stretched exponential relation, $f(x) \sim e^{-a x^{\gamma}}$ with $\gamma \approx 0.38 \pm 0.05$ and $a \approx 3.9 \pm 0.5$, while the curve fitting the shuffled records is exponential, $y = e^{-bx}$, from a Poisson distribution. Note that all the datasets are consistent with a single scaling relation. A Poisson distribution indicates no correlation in shuffled volatility data, but the stretched exponential behavior indicates strong correlation in the volatilities (see [38]). (b) Stretched exponential fit for AT&T, Citi, GE and S&P 500, all with $\gamma \approx 0.4$. Each stock is well approximated by a stretched exponential for diverse thresholds, $q = 2$, 3, 4, 5 and 6, presented in the plot. Each plot is shifted by a factor of 10 for clarity.

FIG. 4: Scaling for different time windows, $\Delta t = 1$, 5, 10, 15 and 30 min. Plots display the scaled pdf $P_q(\tau)\bar{\tau}$ with threshold $q = 2$ for volatility return intervals (filled symbols) and shuffled volatility records (shifted by a factor of 10, open symbols) vs. $\tau/\bar{\tau}$ of (a) AT&T, (b) Citi, (c) GE and (d) IBM. Each symbol represents one scale $\Delta t$, as shown in (a). Similar to Fig. 2 and Fig. 3, curves fall onto a single line for actual return intervals and shuffled data respectively, which indicates the scaling relation in Eq. (7). Also, the actual return intervals suggest a stretched exponential scaling function, demonstrated by the line fitting the solid symbols. The stretched exponential is a result of the long-term correlations in the volatility records. The shuffled volatility records display no correlation, indicated by the good fit (solid line) to the Poisson distribution.







FIG. 5: Scaled conditional distribution $P_q(\tau|\tau_0)\bar{\tau}$ vs. $\tau/\bar{\tau}$ for (a) AT&T, (b) Citi, (c) GE and (d) S&P 500. Here $\tau_0$ represents a bin, i.e., a subset containing 1/8 of the total number of return intervals sorted in increasing order. The lowest 1/8 subset (solid symbols) and the largest 1/8 subset (empty symbols) are displayed; they have different tendencies, as suggested by the black curves. Symbols are plotted for different thresholds, denoted in (a). In contrast to the largest subset, the lowest bin has larger probability for small intervals and smaller probability for large values, which indicates memory in the records: small intervals tend to follow small ones and large intervals tend to follow large ones. Solid curves on symbols are stretched exponential fits.

FIG. 6: Scaled mean conditional return interval $\langle\tau|\tau_0\rangle/\bar{\tau}$ vs. $\tau_0/\bar{\tau}$ for (a) AT&T, (b) Citi, (c) GE and (d) S&P 500. The $\langle\tau|\tau_0\rangle/\bar{\tau}$ of intervals (filled symbols) and shuffled records (open symbols) are plotted. Five thresholds, $q = 2.0$, 2.5, 3.0, 3.5 and 4.0, are represented by different symbols, as shown in (a). The distinct difference between actual intervals and shuffled records implies memory in the original interval records.

FIG. 7: Memory in return interval clusters. $\tau_0$ represents a cluster of intervals, consisting of $n$ consecutive values that are all above (denoted as "+") or below ("-") the median of the entire interval records. Plots display the scaled mean interval conditioned on a cluster, $\langle\tau|\tau_0\rangle/\bar{\tau}$, vs. the size $n$ of the cluster for (a) AT&T, (b) Citi, (c) GE and (d) S&P 500. One symbol shows one threshold $q$, as shown in (c). The upper part of the curves is for "+" clusters while the lower part is for "-" clusters. The plots show that "+" clusters are likely to be followed by large intervals, and "-" clusters by small intervals, consistent with long-term memory in return interval records.

FIG. 8: Root-mean-square fluctuation $F(\ell)$ for (a) volatility records and (b) return interval records ($q = 2$) obtained by the DFA method. Four companies are shown, AT&T, Citi, GE and IBM (each shifted by a factor of 10). The range of window sizes is split by vertical dashed lines, $\ell^* = 390$ for volatilities (sampled each minute) and $\ell^* = 93$ for return intervals, both corresponding to a time window of one day. The two regimes have different correlation exponents, as indicated by the straight lines. (c) Correlation exponent $\alpha$ for 30 DJIA stocks and S&P 500 index (related stock names are shown on the x-axis). Volatility (circles) and return interval (squares) exponents at large and small scales are shown. Note that most companies have a smaller exponent for intervals than for volatilities, but their differences are still within the range of the error bars. Shuffled records (diamonds) possess $\alpha$ values around 0.5, which indicate no correlation. Large scales ($\alpha = 0.98 \pm 0.04$ and $\alpha = 0.92 \pm 0.04$, group average ± standard deviation for volatilities and intervals respectively) and small scales ($\alpha = 0.66 \pm 0.01$ and $\alpha = 0.64 \pm 0.02$ correspondingly) show different correlation strengths, with $\alpha > 0.5$ in both regimes.